2023

Plan

  • Introduce the neural network design
  • Calculate logistic regression in R by using glm
  • Implement all calculations in Python
  • Build the same model by using Tensorflow
  • Build the same model by using PyTorch
  • Build the same model by using PyTorch with Lightning

Artificial Neural Network design

  • The neural network design will be simple:
    one input node and one neuron in one layer.
Figure. Neural network design for logistic regression

  • The training data \(\{(x_i, y_i) \}_{i=1,2,\dots,6}\) are as follows:
    \(\{(2,0), (3,0), (5,1), (7,0), (11,1), (13,1)\}\).

Idea for Artificial Neural Network design

  • \(x \mapsto z^1_{1}=w^1_{11}x+b^1_1\).

  • \(\text{neuron: } z^1_{1}\mapsto \sigma(z^1_{1})\). Here, \(\sigma=\mathrm{logit}^{-1}\), the sigmoid function.

  • We would like our neuron (our only neuron) to fire 1 when the input is 5, 11, or 13 and to fire 0 when the input is 2, 3, or 7.

  • This needs to be achieved by choosing values for \(w^1_{11}\) and \(b^1_1\). Is that possible? Unfortunately, no: \(\sigma(w^1_{11}x+b^1_1)\) is monotone in \(x\), so no choice of parameters can put the output for \(x=7\) on the opposite side of the outputs for \(x=5\) and \(x=11\). We will instead choose \(w^1_{11}\) and \(b^1_1\) so that the model performs as closely as it can to what is desired.
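
A quick numeric illustration in Python: the single neuron evaluated at the values \(w=0.5518\) and \(b=-3.5491\) that glm finds later in these slides.

```python
import math

def sigmoid(z):
    """The logistic sigmoid, i.e. the inverse of the logit function."""
    return 1.0 / (1.0 + math.exp(-z))

# Weight and bias as fitted later by R's glm.
w, b = 0.5518, -3.5491

for x, y in [(2, 0), (3, 0), (5, 1), (7, 0), (11, 1), (13, 1)]:
    a = sigmoid(w * x + b)
    print(f"x={x:2d}  desired={y}  neuron output={a:.3f}")
```

The outputs increase with \(x\); the neighboring points \((5,1)\) and \((7,0)\) both land on the wrong side of 0.5, which is exactly the unavoidable compromise described above.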

Logistic regression using R package

training_data <- data.frame(x = c(2, 3, 5, 7, 11, 13), y = c(0, 0, 1, 0, 1, 1))
log_reg.res <- glm(data = training_data, y ~ x, family = binomial)
w <- coef(summary(log_reg.res))[2, 1]  # slope estimate
b <- coef(summary(log_reg.res))[1, 1]  # intercept estimate

‘glm’ fits generalized linear models; with family binomial (and its default logit link) it performs logistic regression.
\(w=w^1_{11}=0.5518\) and \(b=b^1_1=-3.5491\) are found by glm.

library(boot)     # provides inv.logit
library(ggplot2)
x1 <- seq(0, 13, by = 0.2); y1 <- inv.logit(w * x1 + b)
ggplot(data = training_data, aes(x = x, y = y)) + geom_point() +
  geom_line(data = data.frame(x = x1, y = y1), aes(x = x1, y = y1)) +
  labs(title = paste0("w = ", round(w, 4), " b = ", round(b, 4)))

Digression - general NN model and language

Forward passing (propagation)

Outputs \(a^{l-1}_k\) from the previous layer (\(l-1\)) arrive at the \(j\)-th neuron in layer \(l\). First, \(z^l_j\) is calculated as \(z^l_{j} =b^l_j + \sum_k w^l_{jk} a^{l-1}_k\). In matrix form, \[z^l = w^la^{l-1}+b^l,\] where \(z^l=[z^l_1\; z^l_2 \, \dots \, z^l_m]^T\), \(w^l=[w^l_{jk}]_{1\leq j\leq m, 1\leq k\leq n}\), \(a^{l-1}=[\sigma(z^{l-1}_1)\; \sigma(z^{l-1}_2) \, \dots \, \sigma(z^{l-1}_n)]^T\), and \(b^l=[b^l_1\; b^l_2 \, \dots \, b^l_m]^T\).
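
The vectorized forward pass above can be sketched in a few lines of Python with NumPy; the layer sizes below are arbitrary choices for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)

# Assumed layer sizes for illustration: 3 inputs -> 4 neurons -> 2 neurons.
sizes = [3, 4, 2]
weights = [rng.normal(size=(m, n)) for n, m in zip(sizes[:-1], sizes[1:])]
biases = [rng.normal(size=(m, 1)) for m in sizes[1:]]

def forward(a):
    """Apply z^l = w^l a^(l-1) + b^l and a^l = sigma(z^l), layer by layer."""
    for w, b in zip(weights, biases):
        z = w @ a + b      # affine step
        a = sigmoid(z)     # activation
    return a

a0 = np.array([[0.1], [0.2], [0.3]])   # input column vector a^0
print(forward(a0))                     # column vector of final activations
```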

After the last layer, a cost function compares the model’s final output against the desired output. Searching for the \(w\)’s and \(b\)’s that minimize the cost function is called learning. One popular method for this optimization is gradient descent. For it, we need the gradient of the cost function viewed as a function of the \(w\)’s and \(b\)’s.
The cost function for the logistic regression that we are building is \[ C(w,b) = -\sum_{i=1}^6 \Bigl[ y_i \ln\bigl( \mathrm{logit}^{-1}(wx_i+b) \bigr) + (1-y_i) \ln\bigl(1-\mathrm{logit}^{-1}(wx_i+b)\bigr) \Bigr].\] It is also called binary cross entropy.

Back propagation

The logistic regression model that we are building as an artificial neural network is simple enough to calculate the gradient just by following the basic gradient formula.
In general, however, it can be challenging as there are more layers and more neurons. For that, there is an algorithmic approach called back propagation. First, define \(\delta^l_j=\frac{\partial C}{\partial z^l_j}\), and let \(L\) denote the last layer.

  • BP1: \(\delta^L_j=\frac{\partial C}{\partial a^L_j}\sigma'(z^L_j)\)

  • BP2: \(\delta^l=\bigl( (w^{l+1})^T \delta^{l+1}\bigr) \odot \sigma '(z^l) \text{ for } l<L. \;\;\odot: \text{Hadamard product}\)

  • BP3: \(\frac{\partial C}{\partial b^l_j}=\delta^l_j\)

  • BP4: \(\frac{\partial C}{\partial w^l_{jk}}=a^{l-1}_k\delta^l_j\)
    Back propagation is a few layers of chain-rule calculation. It is well suited to implementing the gradient computation step by step.

Back propagation

  • Let’s calculate \(\frac{\partial C}{\partial w^L_{jk}}\) first. Combining BP4 and BP1,
    \(\frac{\partial C}{\partial w^L_{jk}}=a^{L-1}_k\delta^L_j =a^{L-1}_k\frac{\partial C}{\partial a^L_j}\sigma'(z^L_j).\) The calculation of every term on the right-hand side is straightforward, because each is either a last-layer quantity or a value already computed during the forward pass.

  • It is also not challenging to calculate \(\frac{\partial C}{\partial b^L_{j}}\). It’s another last layer calculation.
  • Let’s try \(\frac{\partial C}{\partial w^{L-1}_{jk}}\). \(\frac{\partial C}{\partial w^{L-1}_{jk}}=a^{L-2}_k\delta^{L-1}_j\). \(a^{L-2}_k\) has already been calculated in the forward propagation. \(\delta^{L-1}=\bigl( (w^{L})^T \delta^{L}\bigr) \odot \sigma '(z^{L-1})\) by BP2, and everything on the right-hand side is computable from the calculation at layer \(L\).
  • In this way, the calculation marches backward to calculate all \(\frac{\partial C}{\partial w^{l}_{jk}}\) and \(\frac{\partial C}{\partial b^{l}_{j}}\).

Back to logistic regression

While our neural network model for logistic regression has only one layer (which is therefore the last layer), let’s identify the back-propagation equations inside the calculation.
The cost function \(C\) is a sum of terms \(-y\ln(a)-(1-y)\ln(1-a)\), one for each training pair \((x,y)\). So let’s work with a single such term as \(C\), that is, \(C(w,b)=-y\ln(a)-(1-y)\ln(1-a)\). Remember that \(z=wx+b\) and \(a=\sigma(z)\).

\(\frac{\partial C}{\partial w}=-y\frac{1}{a}\frac{\partial a}{\partial w}-(1-y)\frac{-1}{1-a}\frac{\partial a}{\partial w}\)
\(\;\;\;\;\;\;=-y\frac{1}{a}\sigma'(z)\frac{\partial z}{\partial w}-(1-y)\frac{-1}{1-a}\sigma'(z)\frac{\partial z}{\partial w}\)
\(\;\;\;\;\;\;=-y\frac{1}{a}\sigma'(z)x-(1-y)\frac{-1}{1-a}\sigma'(z)x\)
\(\;\;\;\;\;\;=-\bigl(y\frac{1}{a}\sigma'(z)+(1-y)\frac{-1}{1-a}\sigma'(z) \bigl)x\)
\(\;\;\;\;\;\;=-\bigl(y\frac{1}{a}\sigma'(z)+(1-y)\frac{-1}{1-a}\sigma'(z) \bigl)a^0_1\)
\(\;\;\;\;\;\;= \frac{\partial C}{\partial z}a^0_1=\frac{\partial C}{\partial z^1_1}a^0_1=\delta^1_1a^0_1.\;\;\) This is BP4.

continued

\(\frac{\partial C}{\partial b}=-y\frac{1}{a}\frac{\partial a}{\partial b}-(1-y)\frac{-1}{1-a}\frac{\partial a}{\partial b}\)
\(\;\;\;\;\;\;=-y\frac{1}{a}\sigma'(z)\frac{\partial z}{\partial b}-(1-y)\frac{-1}{1-a}\sigma'(z)\frac{\partial z}{\partial b}\)
\(\;\;\;\;\;\;=-y\frac{1}{a}\sigma'(z)-(1-y)\frac{-1}{1-a}\sigma'(z)\)
\(\;\;\;\;\;\;=\delta^1_1,\)
this is BP3.
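
Since \(\sigma'(z)=\sigma(z)(1-\sigma(z))=a(1-a)\), both expressions simplify nicely: \(\delta^1_1=a-y\), so \(\frac{\partial C}{\partial w}=(a-y)x\) and \(\frac{\partial C}{\partial b}=a-y\). The derivation can be double-checked numerically with finite differences; the check point \(w=0.3\), \(b=-1\) below is arbitrary.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def cost(w, b, data):
    """Binary cross entropy summed over the training points."""
    total = 0.0
    for x, y in data:
        a = sigmoid(w * x + b)
        total -= y * math.log(a) + (1 - y) * math.log(1 - a)
    return total

data = [(2, 0), (3, 0), (5, 1), (7, 0), (11, 1), (13, 1)]
w, b = 0.3, -1.0   # arbitrary point at which to check the gradient

# Analytic gradient from the derivation: delta = a - y.
dw = sum((sigmoid(w * x + b) - y) * x for x, y in data)
db = sum(sigmoid(w * x + b) - y for x, y in data)

# Central finite differences for comparison.
eps = 1e-6
dw_num = (cost(w + eps, b, data) - cost(w - eps, b, data)) / (2 * eps)
db_num = (cost(w, b + eps, data) - cost(w, b - eps, data)) / (2 * eps)

print(abs(dw - dw_num), abs(db - db_num))   # both differences are tiny
```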

Implementing the computation for neural network learning by using Python code
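
A plain-Python sketch of the whole learning loop: gradient descent on the binary cross entropy, using \(\frac{\partial C}{\partial w}=\sum_i (a_i-y_i)x_i\) and \(\frac{\partial C}{\partial b}=\sum_i (a_i-y_i)\) from the back-propagation slides. The learning rate and iteration count are assumed values, not from the slides.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

data = [(2, 0), (3, 0), (5, 1), (7, 0), (11, 1), (13, 1)]

w, b = 0.0, 0.0
lr = 0.01                      # learning rate (an assumed value)
for _ in range(200_000):       # iteration count (an assumed value)
    dw = sum((sigmoid(w * x + b) - y) * x for x, y in data)
    db = sum(sigmoid(w * x + b) - y for x, y in data)
    w -= lr * dw               # gradient descent step
    b -= lr * db

print(round(w, 4), round(b, 4))   # approaches glm's 0.5518 and -3.5491
```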

TensorFlow model setup

An implementation using basic Python code becomes complicated as there are more layers and more neurons. Let’s now look at the TensorFlow package. A layer can be added with one extra line, and the number of neurons does not add any complexity to the code.
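
A sketch of the setup in TensorFlow/Keras (the optimizer choice and learning rate are assumptions):

```python
import numpy as np
import tensorflow as tf

x = np.array([2., 3., 5., 7., 11., 13.], dtype=np.float32)
y = np.array([0., 0., 1., 0., 1., 1.], dtype=np.float32)

# One Dense layer with a single neuron and a sigmoid activation:
# exactly the one-neuron design above. Another Dense line would add a layer.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(1,)),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(
    optimizer=tf.keras.optimizers.SGD(learning_rate=0.05),  # assumed rate
    loss=tf.keras.losses.BinaryCrossentropy(),              # the cost above
)
model.summary()   # two trainable parameters: w and b
```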

TensorFlow model fitting

The results are the same (okay, almost the same).
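
A self-contained fitting sketch (the setup lines are repeated so the snippet runs on its own; the epoch count and learning rate are assumptions):

```python
import numpy as np
import tensorflow as tf

x = np.array([2., 3., 5., 7., 11., 13.], dtype=np.float32)
y = np.array([0., 0., 1., 0., 1., 1.], dtype=np.float32)

# Setup repeated for self-containment.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(1,)),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer=tf.keras.optimizers.SGD(learning_rate=0.05),
              loss=tf.keras.losses.BinaryCrossentropy())

model.fit(x, y, epochs=5000, verbose=0)   # assumed epoch count
w, b = (v.numpy().item() for v in model.layers[0].weights)
print(w, b)   # with enough epochs these approach glm's 0.5518 and -3.5491
```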

PyTorch model setup
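
A sketch of the same model in PyTorch (the optimizer choice and learning rate are assumptions):

```python
import torch
from torch import nn

x = torch.tensor([[2.], [3.], [5.], [7.], [11.], [13.]])
y = torch.tensor([[0.], [0.], [1.], [0.], [1.], [1.]])

# One linear unit (one input, one output) followed by a sigmoid:
# the same one-neuron design.
model = nn.Sequential(nn.Linear(1, 1), nn.Sigmoid())
loss_fn = nn.BCELoss()                                   # binary cross entropy
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)  # assumed rate
```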

PyTorch model fitting
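
A self-contained training-loop sketch (the setup is repeated so it runs on its own; the iteration count and learning rate are assumptions):

```python
import torch
from torch import nn

x = torch.tensor([[2.], [3.], [5.], [7.], [11.], [13.]])
y = torch.tensor([[0.], [0.], [1.], [0.], [1.], [1.]])

model = nn.Sequential(nn.Linear(1, 1), nn.Sigmoid())  # setup repeated
loss_fn = nn.BCELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

for _ in range(20_000):               # assumed iteration count
    optimizer.zero_grad()
    loss = loss_fn(model(x), y)       # forward pass + cost
    loss.backward()                   # back propagation
    optimizer.step()                  # gradient descent update

w = model[0].weight.item()
b = model[0].bias.item()
print(w, b)   # approaches glm's 0.5518 and -3.5491
```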

PyTorch with Lightning

Lightning training